Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means for improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also utilizes information induced from documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant-document percentage. Furthermore, using the model to produce document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed. The performance also compares favorably with that of a state-of-the-art pseudo-feedback-based retrieval method.